Binary coding of speech spectrograms using a deep auto-encoder

Authors

Li Deng

Michael L. Seltzer

Dong Yu

Alex Acero

Abdel-rahman Mohamed

Geoffrey E. Hinton

Abstract

This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below, and the weights on these connections are pre-trained efficiently by using the contrastive divergence approximation to the log-likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
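The reconstruction and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes that overlapping segments are combined by uniform averaging and that log-spectral distortion is the per-frame RMS difference (in dB) averaged over frames; the abstract does not specify the exact windowing or distortion formula, and the function names `overlap_add` and `log_spectral_distortion` are hypothetical.

```python
import numpy as np


def overlap_add(patches, hop):
    """Combine overlapping spectrogram patches into one spectrogram.

    patches: array of shape (n_patches, n_freq, patch_len).
    hop: frame shift between consecutive patches.
    Assumption: overlapping frames are averaged with uniform weights.
    """
    n_patches, n_freq, patch_len = patches.shape
    total = hop * (n_patches - 1) + patch_len
    out = np.zeros((n_freq, total))
    weight = np.zeros(total)
    for i in range(n_patches):
        start = i * hop
        out[:, start:start + patch_len] += patches[i]
        weight[start:start + patch_len] += 1.0
    # Every frame is covered by at least one patch, so weight is never zero.
    return out / weight


def log_spectral_distortion(ref_db, est_db):
    """Mean over frames of the RMS difference over frequency.

    Both inputs are log-power spectrograms in dB, shape (n_freq, n_frames).
    Assumption: this common definition matches the one behind the 2 dB figure.
    """
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=0))
    return float(np.mean(per_frame))
```

In use, each patch passed to `overlap_add` would be the decoder output for one binary code, and `log_spectral_distortion` would compare the stitched result against the original log spectrogram.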


Related papers

The Diagnosis of Brucellosis in Rafsanjan City Using Deep Auto-Encoder Neural Networks

Introduction: Brucellosis is considered as one of the most important common infectious diseases between humans and animals. Considering the endemic nature of brucellosis and the existence of numerous reports of human and animal cases of brucellosis in Iran, the incidence of human brucellosis in Rafsanjan city was determined in the last 3 years (2016–2018). The main objective of this study was t...


Deep Denoising Auto-encoder for Statistical Speech Synthesis

This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and...


Updating the silent speech challenge benchmark with deep learning

The 2010 Silent Speech Challenge benchmark is updated with new results obtained in a Deep Learning strategy, using the same input features and decoding strategy as in the original article. A Word Error Rate of 6.4% is obtained, compared to the published value of 17.4%. Additional results comparing new auto-encoder-based features with the original features at reduced dimensionality, as well as d...


Using an autoencoder with deformable templates to discover features for automated speech recognition

In this paper we show how we can discover non-linear features of frames of spectrograms using a novel autoencoder. The autoencoder uses a neural network encoder that predicts how a set of prototypes called templates need to be transformed to reconstruct the data, and a decoder that is a function that performs this operation of transforming prototypes and reconstructing the input. We demonstrate...


Publication date: 2010
